Remove useless arguments in ReadCheckpointRecord().

  • Jump to comment-1
    masao.fujii@oss.nttdata.com2022-07-20T14:50:44+00:00
    Hi, I'd like to propose to remove "whichChkpt" and "report" arguments in ReadCheckpointRecord(). "report" is obviously useless because it's always true, i.e., there are two callers of the function and they always specify true as "report". "whichChkpt" indicates where the specified checkpoint location came from, pg_control or backup_label. This information is used to log different messages as follows. switch (whichChkpt) { case 1: ereport(LOG, (errmsg("invalid primary checkpoint link in control file"))); break; default: ereport(LOG, (errmsg("invalid checkpoint link in backup_label file"))); break; } return NULL; ... switch (whichChkpt) { case 1: ereport(LOG, (errmsg("invalid primary checkpoint record"))); break; default: ereport(LOG, (errmsg("invalid checkpoint record"))); break; } return NULL; ... But the callers of ReadCheckpointRecord() already output different log messages depending on where the invalid checkpoint record came from. So even if ReadCheckpointRecord() doesn't use "whichChkpt", i.e., use the same log message in both pg_control and backup_label cases, users can still identify where the invalid checkpoint record came from, by reading the log message. Also when whichChkpt = 0, "primary checkpoint" is used in the log message and sounds confusing because, as far as I recall correctly, we removed the concept of primary and secondary checkpoints before. Therefore I think that it's better to remove "whichChkpt" argument in ReadCheckpointRecord(). Patch attached. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
    • Jump to comment-1
      bharath.rupireddyforpostgres@gmail.com2022-07-20T15:29:20+00:00
      On Wed, Jul 20, 2022 at 8:21 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > Hi, > > I'd like to propose to remove "whichChkpt" and "report" arguments in ReadCheckpointRecord(). "report" is obviously useless because it's always true, i.e., there are two callers of the function and they always specify true as "report". Yes, the report parameter is obvious to delete. The commit 1d919de5eb removed the only call with the report parameter as false. > "whichChkpt" indicates where the specified checkpoint location came from, pg_control or backup_label. This information is used to log different messages as follows. > > switch (whichChkpt) > { > case 1: > ereport(LOG, > (errmsg("invalid primary checkpoint link in control file"))); > break; > default: > ereport(LOG, > (errmsg("invalid checkpoint link in backup_label file"))); > break; > } > return NULL; > ... > switch (whichChkpt) > { > case 1: > ereport(LOG, > (errmsg("invalid primary checkpoint record"))); > break; > default: > ereport(LOG, > (errmsg("invalid checkpoint record"))); > break; > } > return NULL; > ... > > But the callers of ReadCheckpointRecord() already output different log messages depending on where the invalid checkpoint record came from. So even if ReadCheckpointRecord() doesn't use "whichChkpt", i.e., use the same log message in both pg_control and backup_label cases, users can still identify where the invalid checkpoint record came from, by reading the log message. > > Also when whichChkpt = 0, "primary checkpoint" is used in the log message and sounds confusing because, as far as I recall correctly, we removed the concept of primary and secondary checkpoints before. Yes, using "primary checkpoint" confuses, as we usually refer primary in the context of replication and HA. > Therefore I think that it's better to remove "whichChkpt" argument in ReadCheckpointRecord(). > > Patch attached. Thought? How about we transform the following messages into something like below? (errmsg("could not locate a valid checkpoint record"))); after ReadCheckpointRecord() for control file cases to "could not locate valid checkpoint record in control file" (errmsg("could not locate required checkpoint record"), after ReadCheckpointRecord() for backup_label case to "could not locate valid checkpoint record in backup_label file" The above messages give more meaningful and unique info to the users. May be unrelated, IIRC, for the errors like ereport(PANIC, (errmsg("could not locate a valid checkpoint record"))); we wanted to add a hint asking users to consider running pg_resetwal to fix the issue. The error for ReadCheckpointRecord() in backup_label file cases, already gives a hint errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" ...... Adding the hint of running pg_resetwal (of course with a caution that it can cause inconsistency in the data and use it as a last resort as described in the docs) helps users and support engineers a lot to mitigate the server down cases quickly. Regards, Bharath Rupireddy.
      • Jump to comment-1
        masao.fujii@oss.nttdata.com2022-07-21T02:45:23+00:00
        On 2022/07/21 0:29, Bharath Rupireddy wrote: > How about we transform the following messages into something like below? > > (errmsg("could not locate a valid checkpoint record"))); after > ReadCheckpointRecord() for control file cases to "could not locate > valid checkpoint record in control file" > (errmsg("could not locate required checkpoint record"), after > ReadCheckpointRecord() for backup_label case to "could not locate > valid checkpoint record in backup_label file" > > The above messages give more meaningful and unique info to the users. Agreed. Attached is the updated version of the patch. Thanks for the review! > May be unrelated, IIRC, for the errors like ereport(PANIC, > (errmsg("could not locate a valid checkpoint record"))); we wanted to > add a hint asking users to consider running pg_resetwal to fix the > issue. The error for ReadCheckpointRecord() in backup_label file > cases, already gives a hint errhint("If you are restoring from a > backup, touch \"%s/recovery.signal\" ...... Adding the hint of running > pg_resetwal (of course with a caution that it can cause inconsistency > in the data and use it as a last resort as described in the docs) > helps users and support engineers a lot to mitigate the server down > cases quickly. +1 to add some hint messages. But I'm not sure if it's good to hint the use of pg_resetwal because, as you're saying, it should be used as a last resort and has some big risks like data loss, corruption, etc. That is, I'm concerned about that some users might quickly run pg_resetwal because hint message says that, without reading the docs nor understanding those risks. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
        • Jump to comment-1
          horikyota.ntt@gmail.com2022-07-21T05:54:49+00:00
          I agree to removing the two parameters. And agree to let ReadCheckpointRecord not conscious of the location source. At Thu, 21 Jul 2022 11:45:23 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > Agreed. Attached is the updated version of the patch. > Thanks for the review! - (errmsg("could not locate required checkpoint record"), + (errmsg("could not locate a valid checkpoint record in backup_label file"), "in backup_label" there looks *to me* need some verb.. By the way, this looks like a good chance to remove the (now) extra parens around errmsg() and friends. For example: - (errmsg("could not locate a valid checkpoint record in backup_label file"), + errmsg("could not locate checkpoint record spcified in backup_label file"), - (errmsg("could not locate a valid checkpoint record in control file"))); + errmsg("could not locate checkpoint record recorded in control file"))); + (errmsg("invalid checkpoint record"))); Is it useful to show the specified LSN there? + (errmsg("invalid resource manager ID in checkpoint record"))); We have a similar message "invalid resource manager ID %u at %X/%X". Since the caller explains that it is a checkpoint record, we can share the message here. + (errmsg("invalid xl_info in checkpoint record"))); (It is not an issue of this patch, though) I don't think this is appropriate for user-facing message. Counldn't we say "unexpected record type: %x" or something like? + (errmsg("invalid length of checkpoint record"))); We have "record with invalid length at %X/%X" or "invalid record length at %X/%X: wanted %u, got %u". Counld we reuse any of them? > > May be unrelated, IIRC, for the errors like ereport(PANIC, > > (errmsg("could not locate a valid checkpoint record"))); we wanted to > > add a hint asking users to consider running pg_resetwal to fix the > > issue. The error for ReadCheckpointRecord() in backup_label file > > cases, already gives a hint errhint("If you are restoring from a > > backup, touch \"%s/recovery.signal\" ...... Adding the hint of running > > pg_resetwal (of course with a caution that it can cause inconsistency > > in the data and use it as a last resort as described in the docs) > > helps users and support engineers a lot to mitigate the server down > > cases quickly. > > +1 to add some hint messages. But I'm not sure if it's good to hint > the use of pg_resetwal because, as you're saying, it should be used as > a last resort and has some big risks like data loss, corruption, > etc. That is, I'm concerned about that some users might quickly run > pg_resetwal because hint message says that, without reading the docs > nor understanding those risks. I don't think we want to recommend pg_resetwal to those who don't reach it by themselves by other means. I know of a few instances of people who had the tool (unrecoverably) break their own clusters. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
          • Jump to comment-1
            bharath.rupireddyforpostgres@gmail.com2022-07-25T06:39:26+00:00
            On Thu, Jul 21, 2022 at 11:24 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > > > May be unrelated, IIRC, for the errors like ereport(PANIC, > > > (errmsg("could not locate a valid checkpoint record"))); we wanted to > > > add a hint asking users to consider running pg_resetwal to fix the > > > issue. The error for ReadCheckpointRecord() in backup_label file > > > cases, already gives a hint errhint("If you are restoring from a > > > backup, touch \"%s/recovery.signal\" ...... Adding the hint of running > > > pg_resetwal (of course with a caution that it can cause inconsistency > > > in the data and use it as a last resort as described in the docs) > > > helps users and support engineers a lot to mitigate the server down > > > cases quickly. > > > > +1 to add some hint messages. But I'm not sure if it's good to hint > > the use of pg_resetwal because, as you're saying, it should be used as > > a last resort and has some big risks like data loss, corruption, > > etc. That is, I'm concerned about that some users might quickly run > > pg_resetwal because hint message says that, without reading the docs > > nor understanding those risks. > > I don't think we want to recommend pg_resetwal to those who don't > reach it by themselves by other means. I know of a few instances of > people who had the tool (unrecoverably) break their own clusters. Agree. We might want to take this topic separately as it needs more careful study of common issues such as PANICs and then adding hints with proven ways to repair the server and bring it back online. Regards, Bharath Rupireddy.
          • Jump to comment-1
            masao.fujii@oss.nttdata.com2022-07-22T02:50:14+00:00
            On 2022/07/21 14:54, Kyotaro Horiguchi wrote: > I agree to removing the two parameters. And agree to let > ReadCheckpointRecord not conscious of the location source. Thanks for the review! > At Thu, 21 Jul 2022 11:45:23 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >> Agreed. Attached is the updated version of the patch. >> Thanks for the review! > > - (errmsg("could not locate required checkpoint record"), > + (errmsg("could not locate a valid checkpoint record in backup_label file"), > > "in backup_label" there looks *to me* need some verb.. Sorry, I failed to understand your point. Could you clarify your point? > By the way, > this looks like a good chance to remove the (now) extra parens around > errmsg() and friends. > > For example: > > - (errmsg("could not locate a valid checkpoint record in backup_label file"), > + errmsg("could not locate checkpoint record spcified in backup_label file"), > > - (errmsg("could not locate a valid checkpoint record in control file"))); > + errmsg("could not locate checkpoint record recorded in control file"))); Because it's recommended not to put parenthesis just before errmsg(), you mean? I'm ok to remove such parenthesis, but I'd like understand why before doing that. I was thinking that either having or not having parenthesis in front of errmsg() is ok, so there are many calls to errmsg() with parenthesis, in xlogrecovery.c. > + (errmsg("invalid checkpoint record"))); > > Is it useful to show the specified LSN there? Yes, LSN info would be helpful also for debugging. I separated the patch into two; one is to remove useless arguments in ReadCheckpointRecord(), another is to improve log messages. I added LSN info in log messages in the second patch. > + (errmsg("invalid resource manager ID in checkpoint record"))); > > We have a similar message "invalid resource manager ID %u at > %X/%X". Since the caller explains that it is a checkpoint record, we > can share the message here. +1 > + (errmsg("invalid xl_info in checkpoint record"))); > > (It is not an issue of this patch, though) I don't think this is > appropriate for user-facing message. Counldn't we say "unexpected > record type: %x" or something like? The proposed log message doesn't look like an improvement. But I have no better one. So I left the message as it is, in the patch, for now. > > + (errmsg("invalid length of checkpoint record"))); > > We have "record with invalid length at %X/%X" or "invalid record > length at %X/%X: wanted %u, got %u". Counld we reuse any of them? +1 Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
            • Jump to comment-1
              horikyota.ntt@gmail.com2022-07-22T08:31:52+00:00
              At Fri, 22 Jul 2022 11:50:14 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > Sorry, I failed to understand your point. Could you clarify your > point? Wrote as a reply to Tom's message. > > By the way, > > this looks like a good chance to remove the (now) extra parens around > > errmsg() and friends. > > For example: > > - (errmsg("could not locate a valid checkpoint record in backup_label > > - file"), > > + errmsg("could not locate checkpoint record spcified in backup_label > > file"), > > - (errmsg("could not locate a valid checkpoint record in control file"))); > > + errmsg("could not locate checkpoint record recorded in control > > file"))); > > Because it's recommended not to put parenthesis just before errmsg(), > you mean? I'm ok to remove such parenthesis, but I'd like understand > why before doing that. I was thinking that either having or not having > parenthesis in front of errmsg() is ok, so there are many calls to > errmsg() with parenthesis, in xlogrecovery.c. I believed that it is recommended to move to the style not having the outmost parens. That style has been introduced by e3a87b4991. > * The extra parentheses around the auxiliary function calls are now > optional. Aside from being a bit less ugly, this removes a common > gotcha for new contributors, because in some cases the compiler errors > you got from forgetting them were unintelligible. ... > While new code can be written either way, code intended to be > back-patched will need to use extra parens for awhile yet. It seems > worth back-patching this change into v12, so as to reduce the window > where we have to be careful about that by one year. Hence, this patch > is careful to preserve ABI compatibility; a followup HEAD-only patch > will make some additional simplifications. So I thought that if we modify an error message, its ererpot can be rewritten. > > + (errmsg("invalid checkpoint record"))); > > Is it useful to show the specified LSN there? > > Yes, LSN info would be helpful also for debugging. > > I separated the patch into two; one is to remove useless arguments in > ReadCheckpointRecord(), another is to improve log messages. I added > LSN info in log messages in the second patch. Thanks! > > + (errmsg("invalid xl_info in checkpoint record"))); > > (It is not an issue of this patch, though) I don't think this is > > appropriate for user-facing message. Counldn't we say "unexpected > > record type: %x" or something like? > > The proposed log message doesn't look like an improvement. But I have > no better one. So I left the message as it is, in the patch, for now. Understood. regards -- Kyotaro Horiguchi NTT Open Source Software Center
              • Jump to comment-1
                masao.fujii@oss.nttdata.com2022-07-25T02:30:30+00:00
                On 2022/07/22 17:31, Kyotaro Horiguchi wrote: > I believed that it is recommended to move to the style not having the > outmost parens. That style has been introduced by e3a87b4991. I read the commit log, but I'm not sure yet if it's really recommended to remove extra parens even from the existing calls to errmsg(). Removing extra parens can interfere with back-patching of the changes around those errmsg(), can't it? Anyway, at first I pushed the 0001 patch that removes useless arguments in ReadCheckpointRecord(). Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
                • Jump to comment-1
                  tgl@sss.pgh.pa.us2022-07-25T02:40:16+00:00
                  Fujii Masao <masao.fujii@oss.nttdata.com> writes: > On 2022/07/22 17:31, Kyotaro Horiguchi wrote: >> I believed that it is recommended to move to the style not having the >> outmost parens. That style has been introduced by e3a87b4991. > I read the commit log, but I'm not sure yet if it's really recommended to remove extra parens even from the existing calls to errmsg(). Removing extra parens can interfere with back-patching of the changes around those errmsg(), can't it? Right, so I wouldn't be in a hurry to change existing calls. If you're editing an ereport call for some other reason, it's fine to remove the excess parens in it, because you're creating a backpatch hazard there anyway. But otherwise, I think such changes are make-work in themselves and risk creating more make-work for somebody else later. regards, tom lane
                  • Jump to comment-1
                    horikyota.ntt@gmail.com2022-07-26T00:42:23+00:00
                    At Sun, 24 Jul 2022 22:40:16 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Fujii Masao <masao.fujii@oss.nttdata.com> writes: > > On 2022/07/22 17:31, Kyotaro Horiguchi wrote: > >> I believed that it is recommended to move to the style not having the > >> outmost parens. That style has been introduced by e3a87b4991. > > > I read the commit log, but I'm not sure yet if it's really recommended to remove extra parens even from the existing calls to errmsg(). Removing extra parens can interfere with back-patching of the changes around those errmsg(), can't it? > > Right, so I wouldn't be in a hurry to change existing calls. If you're > editing an ereport call for some other reason, it's fine to remove the > excess parens in it, because you're creating a backpatch hazard there > anyway. But otherwise, I think such changes are make-work in themselves > and risk creating more make-work for somebody else later. So, I meant to propose to remove extra parens for errmsg()'s where the message string is edited. Is it fine in that criteria? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
                    • Jump to comment-1
                      masao.fujii@oss.nttdata.com2022-07-26T02:02:27+00:00
                      On 2022/07/26 9:42, Kyotaro Horiguchi wrote: > At Sun, 24 Jul 2022 22:40:16 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in >> Fujii Masao <masao.fujii@oss.nttdata.com> writes: >>> On 2022/07/22 17:31, Kyotaro Horiguchi wrote: >>>> I believed that it is recommended to move to the style not having the >>>> outmost parens. That style has been introduced by e3a87b4991. >> >>> I read the commit log, but I'm not sure yet if it's really recommended to remove extra parens even from the existing calls to errmsg(). Removing extra parens can interfere with back-patching of the changes around those errmsg(), can't it? >> >> Right, so I wouldn't be in a hurry to change existing calls. If you're >> editing an ereport call for some other reason, it's fine to remove the >> excess parens in it, because you're creating a backpatch hazard there >> anyway. But otherwise, I think such changes are make-work in themselves >> and risk creating more make-work for somebody else later. > > So, I meant to propose to remove extra parens for errmsg()'s where the > message string is edited. Is it fine in that criteria? Even in that case, removing parens may interfere with the back-patch. For example, please imagine the case where wasShutdown is changed to be set to true in the following code and this changed is back-patched to v15. If we modify only the log message in the following errmsg() and leave the parens around that, git cherry-pick of the change of wasShutdown to v15 would be completed successfully. But if we remove the parens, git cherry-pick would fail. ereport(FATAL, (errmsg("could not locate required checkpoint record"), errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n" "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n" "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.", DataDir, DataDir, DataDir))); wasShutdown = false; /* keep compiler quiet */ Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
            • Jump to comment-1
              tgl@sss.pgh.pa.us2022-07-22T03:10:04+00:00
              Fujii Masao <masao.fujii@oss.nttdata.com> writes: > On 2022/07/21 14:54, Kyotaro Horiguchi wrote: >> At Thu, 21 Jul 2022 11:45:23 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >>> - (errmsg("could not locate required checkpoint record"), >>> + (errmsg("could not locate a valid checkpoint record in backup_label file"), >> "in backup_label" there looks *to me* need some verb.. > Sorry, I failed to understand your point. Could you clarify your point? FWIW, the proposed change looks like perfectly good English to me. "locate" is the verb. It's been way too many years since high school grammar for me to remember the exact term for auxiliary clauses like "in backup_label file", but that doesn't need its own verb. Possibly Kyotaro-san is feeling that it should be like "... checkpoint record in the backup_label file". That'd be more formal, but in the telegraphic style that we prefer for primary error messages, omitting the "the" is fine. regards, tom lane
              • Jump to comment-1
                horikyota.ntt@gmail.com2022-07-22T08:17:12+00:00
                At Thu, 21 Jul 2022 23:10:04 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in > Fujii Masao <masao.fujii@oss.nttdata.com> writes: > > On 2022/07/21 14:54, Kyotaro Horiguchi wrote: > >> At Thu, 21 Jul 2022 11:45:23 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > >>> - (errmsg("could not locate required checkpoint record"), > >>> + (errmsg("could not locate a valid checkpoint record in backup_label file"), > > >> "in backup_label" there looks *to me* need some verb.. > > > Sorry, I failed to understand your point. Could you clarify your point? > > FWIW, the proposed change looks like perfectly good English to me. > "locate" is the verb. It's been way too many years since high > school grammar for me to remember the exact term for auxiliary > clauses like "in backup_label file", but that doesn't need its > own verb. Possibly Kyotaro-san is feeling that it should be > like "... checkpoint record in the backup_label file". That'd > be more formal, but in the telegraphic style that we prefer for > primary error messages, omitting the "the" is fine. Maybe a little different. I thought that a checkpoint record cannot be located in backup_label file. In other words what is in backup_label file is a pointer to the record and the record is in a WAL file. I'm fine with the proposed sentsnse if the it makes the correct sense. (And sorry for the noise) By the way, I learned that the style is called "telegraphic style". regards. -- Kyotaro Horiguchi NTT Open Source Software Center